On the High Dimensional Power of a Linear-Time Two Sample Test under Mean-shift Alternatives
Authors
Abstract
Nonparametric two sample testing deals with the question of consistently deciding if two distributions are different, given samples from both, without making any parametric assumptions about the form of the distributions. The current literature is split into two kinds of tests: those which are consistent without any assumptions about how the distributions may differ (general alternatives), and those which are designed to specifically test easier alternatives, like a difference in means (mean-shift alternatives). The main contribution of this paper is to explicitly characterize the power of a popular nonparametric two sample test, designed for general alternatives, under a mean-shift alternative in the high-dimensional setting. Specifically, we derive the power of the linear-time Maximum Mean Discrepancy statistic using the Gaussian kernel, where the dimension and sample size can both tend to infinity at any rate, and the two distributions differ in their means. As a corollary, we find that if the signal-to-noise ratio is held constant, then the test's power goes to one if the number of samples increases faster than the dimension. This is the first explicit power derivation for a general nonparametric test in the high-dimensional setting, and the first analysis of how tests designed for general alternatives perform against easier ones.
Appearing in Proceedings of the 18th International Conference on Artificial Intelligence and Statistics (AISTATS) 2015, San Diego, CA, USA. JMLR: W&CP volume 38. Copyright 2015 by the authors.
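For readers unfamiliar with the statistic named in the abstract, the following is a minimal sketch of a linear-time Maximum Mean Discrepancy estimate with a Gaussian kernel, in the standard form that averages a kernel contrast over disjoint pairs of samples. It is an illustration under assumptions, not the authors' code: the function names, the kernel parametrization exp(-||a - b||^2 / (2 * bandwidth^2)), the bandwidth choice, and the studentized z-score are all choices made here for the example.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth):
    """Row-wise Gaussian kernel between paired rows of a and b.

    Assumed parametrization: exp(-||a - b||^2 / (2 * bandwidth^2)).
    """
    sq_dist = np.sum((a - b) ** 2, axis=1)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))


def linear_time_mmd(x, y, bandwidth):
    """Linear-time MMD estimate and a studentized z-score.

    x, y: arrays of shape (n, d). Only the first 2 * floor(n / 2) rows
    of each are used, split into disjoint consecutive pairs.
    """
    n = (min(len(x), len(y)) // 2) * 2
    x1, x2 = x[0:n:2], x[1:n:2]
    y1, y2 = y[0:n:2], y[1:n:2]
    # One kernel contrast per disjoint pair, so the cost is O(n * d).
    h = (
        gaussian_kernel(x1, x2, bandwidth)
        + gaussian_kernel(y1, y2, bandwidth)
        - gaussian_kernel(x1, y2, bandwidth)
        - gaussian_kernel(x2, y1, bandwidth)
    )
    mmd = h.mean()
    # The pair contrasts are i.i.d., so the studentized mean is
    # approximately standard normal when the two distributions agree.
    z = np.sqrt(len(h)) * mmd / h.std(ddof=1)
    return mmd, z


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n = 5, 2000
    x = rng.normal(size=(n, d))
    y = rng.normal(size=(n, d)) + 1.0  # mean-shift alternative (illustrative)
    mmd, z = linear_time_mmd(x, y, bandwidth=np.sqrt(d))
    print(f"MMD estimate: {mmd:.4f}, z-score: {z:.2f}")
```

A one-sided test would reject equality of the two distributions when the z-score exceeds a standard normal quantile; the bandwidth of sqrt(d) used in the demo is only a convenient default for this example.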
Similar resources
On the High-dimensional Power of Linear-time Kernel Two-Sample Testing under Mean-difference Alternatives
Nonparametric two sample testing deals with the question of consistently deciding if two distributions are different, given samples from both, without making any parametric assumptions about the form of the distributions. The current literature is split into two kinds of tests: those which are consistent without any assumptions about how the distributions may differ (general alternatives), and t...
Adaptivity and Computation-Statistics Tradeoffs for Kernel and Distance based High Dimensional Two Sample Testing
Nonparametric two sample testing is a decision theoretic problem that involves identifying differences between two random variables without making parametric assumptions about their underlying distributions. We refer to the most common settings as mean difference alternatives (MDA), for testing differences only in first moments, and general difference alternatives (GDA), which is about testing ...
Two-Sample Test of High Dimensional Means under Dependency
This paper considers in the high dimensional setting a canonical testing problem in multivariate analysis, namely testing the equality of two mean vectors. We introduce a new test statistic that is based on a linear transformation of the data by the precision matrix which incorporates the correlations among the variables. Limiting null distribution of the test statistic and the power of the tes...
A Linear-Time Kernel Goodness-of-Fit Test
We propose a novel adaptive test of goodness-of-fit, with computational cost linear in the number of samples. We learn the test features that best indicate the differences between observed samples and a reference model, by minimizing the false negative rate. These features are constructed via Stein’s method, meaning that it is not necessary to compute the normalising constant of the model. We a...
Two-sample test of high dimensional means under dependence
The paper considers in the high dimensional setting a canonical testing problem in multivariate analysis, namely testing the equality of two mean vectors. We introduce a new test statistic that is based on a linear transformation of the data by the precision matrix which incorporates the correlations between the variables. The limiting null distribution of the test statistic and the power of th...